Machine Learning Analysis Report

Generated on August 03, 2025 at 09:07 PM

Machine Learning Analysis Pipeline

EDR: Dataset Loading & Preprocessing

EDR – Train/Test Overview
• Train shape: (17536, 20) | Test shape: (1535, 20)
• Total train samples: 17,536 | Total test samples: 1,535
• Number of features: 16
• Target column: 'label'
• Missing values (train): 0 | (test): 0
EDR – Train Class Distribution
• 0: 16,679
• 1: 857
• Class balance (minority/majority): 5.1382%
EDR – Feature Preparation
• Target encoding: {0: 0, 1: 1}
• Data preprocessing: Infinite values handled, missing values filled with train medians
• Feature scaling: StandardScaler (fit on train, applied to test)
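The preparation steps listed above can be sketched in plain Python. This is a minimal illustration, not the pipeline's actual code: the report does not say how infinite values were "handled", so treating them like missing values is an assumption, and the hand-written scaling is only a stand-in for scikit-learn's StandardScaler. The helper names `fit_preprocessor` and `transform` and the toy rows are hypothetical.

```python
import math
from statistics import fmean, median, pstdev


def _clean(value):
    """Map NaN and +/-inf to None so they can be imputed (assumed handling)."""
    return value if math.isfinite(value) else None


def fit_preprocessor(train_rows):
    """Learn per-column median, mean, and std from the training rows only."""
    columns = zip(*[[_clean(v) for v in row] for row in train_rows])
    params = []
    for col in columns:
        med = median(v for v in col if v is not None)
        filled = [med if v is None else v for v in col]
        std = pstdev(filled)  # population std, as StandardScaler uses
        params.append((med, fmean(filled), std if std > 0 else 1.0))
    return params


def transform(rows, params):
    """Impute with train medians, then standardize with train mean/std."""
    return [
        [((med if _clean(v) is None else v) - mean) / std
         for v, (med, mean, std) in zip(row, params)]
        for row in rows
    ]


# Toy data (hypothetical): two features, with inf/NaN entries to impute.
train = [[1.0, 10.0], [2.0, float("inf")], [3.0, 30.0], [float("nan"), 20.0]]
test = [[2.0, float("-inf")]]

params = fit_preprocessor(train)      # statistics come from train only
train_scaled = transform(train, params)
test_scaled = transform(test, params)  # test reuses the train statistics
print(test_scaled)
```

Fitting on the training split and reusing those statistics on the test split keeps test-set information out of the preprocessing step.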
Baseline (Most-Frequent) Accuracy: 0.9518
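Both the class balance and the most-frequent baseline follow directly from class counts. The short check below reproduces them; the test counts (1,461 negatives, 74 positives) are taken from the support column of the classification reports in this document, and the variable names are illustrative.

```python
train_counts = {0: 16_679, 1: 857}
test_counts = {0: 1_461, 1: 74}  # support values from the classification reports

# Class balance as reported: minority count divided by majority count.
minority, majority = sorted(train_counts.values())
class_balance = minority / majority

# A most-frequent baseline always predicts the majority training class,
# so its test accuracy is that class's share of the test set.
majority_class = max(train_counts, key=train_counts.get)
baseline_accuracy = test_counts[majority_class] / sum(test_counts.values())

print(f"Class balance: {class_balance:.4%}")          # 5.1382%
print(f"Baseline accuracy: {baseline_accuracy:.4f}")  # 0.9518
```

Any model with accuracy below 0.9518 is doing worse than always predicting class 0 on that metric, which is why the imbalance-aware metrics matter more here.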

EDR: Model Performance Comparison

EDR – Model Performance Metrics

Model | Accuracy | Balanced Acc | Precision | Recall | F1 | ROC-AUC | PR-AUC
Logistic Regression | 0.9029 | 0.6347 | 0.2000 | 0.3378 | 0.2513 | 0.6274 | 0.1411
Random Forest (SMOTE) | 0.8306 | 0.6480 | 0.1310 | 0.4459 | 0.2025 | 0.8130 | 0.2174
LightGBM | 0.8091 | 0.6816 | 0.1338 | 0.5405 | 0.2145 | 0.8357 | 0.2646
Balanced RF | 0.8625 | 0.6840 | 0.1722 | 0.4865 | 0.2544 | 0.8424 | 0.2339
SGD SVM | 0.7322 | 0.5386 | 0.0623 | 0.3243 | 0.1046 | n/a | n/a
IsolationForest | 0.9192 | 0.5663 | 0.1711 | 0.1757 | 0.1733 | n/a | n/a

Confusion Matrix Analysis

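Model | TN | FP | FN | TP | FP Rate | Miss Rate
Logistic Regression | 1361 | 100 | 49 | 25 | 6.84% | 66.22%
Random Forest (SMOTE) | 1242 | 219 | 41 | 33 | 14.99% | 55.41%
LightGBM | 1202 | 259 | 34 | 40 | 17.73% | 45.95%
Balanced RF | 1288 | 173 | 38 | 36 | 11.84% | 51.35%
SGD SVM | 1100 | 361 | 50 | 24 | 24.71% | 67.57%
IsolationForest | 1398 | 63 | 61 | 13 | 4.31% | 82.43%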
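Every threshold-dependent metric in the performance table can be recomputed from the four confusion counts. The sketch below does so for the EDR Logistic Regression counts (TN 1361, FP 100, FN 49, TP 25); the function name `metrics_from_counts` is illustrative and not taken from the pipeline.

```python
def metrics_from_counts(tn, fp, fn, tp):
    """Derive the report's threshold-dependent metrics from confusion counts."""
    recall = tp / (tp + fn)        # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return {
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
        "balanced_acc": (recall + specificity) / 2,
        "precision": tp / (tp + fp),
        "recall": recall,
        "f1": 2 * tp / (2 * tp + fp + fn),
        "fp_rate": fp / (fp + tn),    # 1 - specificity
        "miss_rate": fn / (fn + tp),  # 1 - recall
    }


lr = metrics_from_counts(tn=1361, fp=100, fn=49, tp=25)
for name, value in lr.items():
    print(f"{name:>12}: {value:.4f}")
```

The results match the reported 0.9029 accuracy, 0.6347 balanced accuracy, 0.2000 precision, 0.3378 recall, 0.2513 F1, 6.84% FP rate, and 66.22% miss rate.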

Best Models by Metric

Metric | Best Model | Value
Accuracy | IsolationForest | 0.9192
Balanced Acc | Balanced RF | 0.6840
Precision | Logistic Regression | 0.2000
Recall | LightGBM | 0.5405
F1 | Balanced RF | 0.2544
ROC-AUC | Balanced RF | 0.8424
PR-AUC | LightGBM | 0.2646
Lowest False Positive Rate | IsolationForest | 4.31%
Lowest Miss Rate | LightGBM | 45.95%
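The per-metric winners can be selected programmatically from the metrics table. The sketch below uses three of the reported EDR columns and skips models whose value is missing (the ROC-AUC of SGD SVM and IsolationForest), since comparisons against NaN are unreliable with `max`. The helper name `best_by_metric` is hypothetical.

```python
import math

NAN = float("nan")

# Subset of the EDR metrics table; NAN marks values the report lists as missing.
edr_metrics = {
    "Logistic Regression":   {"Accuracy": 0.9029, "F1": 0.2513, "ROC-AUC": 0.6274},
    "Random Forest (SMOTE)": {"Accuracy": 0.8306, "F1": 0.2025, "ROC-AUC": 0.8130},
    "LightGBM":              {"Accuracy": 0.8091, "F1": 0.2145, "ROC-AUC": 0.8357},
    "Balanced RF":           {"Accuracy": 0.8625, "F1": 0.2544, "ROC-AUC": 0.8424},
    "SGD SVM":               {"Accuracy": 0.7322, "F1": 0.1046, "ROC-AUC": NAN},
    "IsolationForest":       {"Accuracy": 0.9192, "F1": 0.1733, "ROC-AUC": NAN},
}


def best_by_metric(results, metric):
    """Return (model, value) with the highest value, ignoring missing entries."""
    scored = [
        (model, row[metric])
        for model, row in results.items()
        if not math.isnan(row[metric])
    ]
    return max(scored, key=lambda pair: pair[1])


for metric in ("Accuracy", "F1", "ROC-AUC"):
    model, value = best_by_metric(edr_metrics, metric)
    print(f"{metric}: {model} ({value:.4f})")
```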

EDR – Metrics by Model

EDR – ROC Curves

EDR – Precision–Recall Curves

EDR – Predicted Probability Distributions

EDR – Threshold Sweep
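The report does not state how the sweep was computed. As a rough sketch of the general idea, a threshold sweep re-labels the predicted scores at several cut-offs and records precision and recall at each one. The labels and scores below are toy values invented for illustration, and `threshold_sweep` is a hypothetical helper, not the pipeline's function.

```python
def threshold_sweep(y_true, scores, thresholds):
    """Return (threshold, precision, recall) for each candidate threshold."""
    rows = []
    for t in thresholds:
        preds = [int(s >= t) for s in scores]
        tp = sum(p == 1 and y == 1 for p, y in zip(preds, y_true))
        fp = sum(p == 1 and y == 0 for p, y in zip(preds, y_true))
        fn = sum(p == 0 and y == 1 for p, y in zip(preds, y_true))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        rows.append((t, precision, recall))
    return rows


# Toy data (hypothetical): three positives among ten samples.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60, 0.70, 0.90]

rows = threshold_sweep(y_true, scores, [0.25, 0.50, 0.75])
for t, p, r in rows:
    print(f"threshold={t:.2f}  precision={p:.4f}  recall={r:.4f}")
```

Raising the threshold trades recall for precision, which is the trade-off such a plot is meant to expose on an imbalanced problem like this one.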

EDR: Logistic Regression – Detailed Analysis

EDR – Logistic Regression: Confusion Matrix

EDR – Logistic Regression: Classification Report

Class | precision | recall | f1 | support
0 | 0.9652 | 0.9316 | 0.9481 | 1461
1 | 0.2000 | 0.3378 | 0.2513 | 74
accuracy | n/a | n/a | 0.9029 | 1535
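Both rows of this classification report follow from the EDR Logistic Regression confusion counts (TN 1361, FP 100, FN 49, TP 25). The sketch below treats each class in turn as the "positive" one; `per_class_report` is an illustrative helper, not part of the pipeline.

```python
def per_class_report(tn, fp, fn, tp):
    """Return {class: (precision, recall, f1, support)} from confusion counts."""

    def row(correct, wrongly_predicted, missed):
        precision = correct / (correct + wrongly_predicted)
        recall = correct / (correct + missed)
        f1 = 2 * precision * recall / (precision + recall)
        return precision, recall, f1, correct + missed

    return {
        0: row(correct=tn, wrongly_predicted=fn, missed=fp),  # class 0 as target
        1: row(correct=tp, wrongly_predicted=fp, missed=fn),  # class 1 as target
    }


report = per_class_report(tn=1361, fp=100, fn=49, tp=25)
for label, (precision, recall, f1, support) in report.items():
    print(f"{label}: {precision:.4f} {recall:.4f} {f1:.4f} {support}")
```

The high class-0 scores mostly reflect the 1461-to-74 imbalance; the class-1 row is the one that describes detection quality.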

EDR – Logistic Regression: Feature Importance

EDR: Random Forest (SMOTE) – Detailed Analysis

EDR – Random Forest (SMOTE): Confusion Matrix

EDR – Random Forest (SMOTE): Classification Report

Class | precision | recall | f1 | support
0 | 0.9680 | 0.8501 | 0.9052 | 1461
1 | 0.1310 | 0.4459 | 0.2025 | 74
accuracy | n/a | n/a | 0.8306 | 1535

EDR – Random Forest (SMOTE): Feature Importance

EDR: LightGBM – Detailed Analysis

EDR – LightGBM: Confusion Matrix

EDR – LightGBM: Classification Report

Class | precision | recall | f1 | support
0 | 0.9725 | 0.8227 | 0.8914 | 1461
1 | 0.1338 | 0.5405 | 0.2145 | 74
accuracy | n/a | n/a | 0.8091 | 1535

EDR – LightGBM: Feature Importance

EDR: Balanced RF – Detailed Analysis

EDR – Balanced RF: Confusion Matrix

EDR – Balanced RF: Classification Report

Class | precision | recall | f1 | support
0 | 0.9713 | 0.8816 | 0.9243 | 1461
1 | 0.1722 | 0.4865 | 0.2544 | 74
accuracy | n/a | n/a | 0.8625 | 1535

EDR – Balanced RF: Feature Importance

EDR: SGD SVM – Detailed Analysis

EDR – SGD SVM: Confusion Matrix

EDR – SGD SVM: Classification Report

Class | precision | recall | f1 | support
0 | 0.9565 | 0.7529 | 0.8426 | 1461
1 | 0.0623 | 0.3243 | 0.1046 | 74
accuracy | n/a | n/a | 0.7322 | 1535

EDR – SGD SVM: Feature Importance

EDR: IsolationForest – Detailed Analysis

EDR – IsolationForest: Confusion Matrix

EDR – IsolationForest: Classification Report

Class | precision | recall | f1 | support
0 | 0.9582 | 0.9569 | 0.9575 | 1461
1 | 0.1711 | 0.1757 | 0.1733 | 74
accuracy | n/a | n/a | 0.9192 | 1535

EDR – IsolationForest: Feature Importance

Feature importance not available for this model type.

XDR: Dataset Loading & Preprocessing

XDR – Train/Test Overview
• Train shape: (17536, 34) | Test shape: (1535, 34)
• Total train samples: 17,536 | Total test samples: 1,535
• Number of features: 30
• Target column: 'label'
• Missing values (train): 0 | (test): 0
XDR – Train Class Distribution
• 0: 16,679
• 1: 857
• Class balance (minority/majority): 5.1382%
XDR – Feature Preparation
• Target encoding: {0: 0, 1: 1}
• Data preprocessing: Infinite values handled, missing values filled with train medians
• Feature scaling: StandardScaler (fit on train, applied to test)
Baseline (Most-Frequent) Accuracy: 0.9518

XDR: Model Performance Comparison

XDR – Model Performance Metrics

Model | Accuracy | Balanced Acc | Precision | Recall | F1 | ROC-AUC | PR-AUC
Logistic Regression | 0.8932 | 0.5847 | 0.1429 | 0.2432 | 0.1800 | 0.6122 | 0.1368
Random Forest (SMOTE) | 0.8612 | 0.6769 | 0.1675 | 0.4730 | 0.2473 | 0.8131 | 0.2234
LightGBM | 0.9016 | 0.6725 | 0.2230 | 0.4189 | 0.2911 | 0.8489 | 0.2725
Balanced RF | 0.8795 | 0.6929 | 0.1967 | 0.4865 | 0.2802 | 0.8397 | 0.2397
SGD SVM | 0.8932 | 0.5847 | 0.1429 | 0.2432 | 0.1800 | n/a | n/a
IsolationForest | 0.9407 | 0.5455 | 0.2424 | 0.1081 | 0.1495 | n/a | n/a

Confusion Matrix Analysis

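Model | TN | FP | FN | TP | FP Rate | Miss Rate
Logistic Regression | 1353 | 108 | 56 | 18 | 7.39% | 75.68%
Random Forest (SMOTE) | 1287 | 174 | 39 | 35 | 11.91% | 52.70%
LightGBM | 1353 | 108 | 43 | 31 | 7.39% | 58.11%
Balanced RF | 1314 | 147 | 38 | 36 | 10.06% | 51.35%
SGD SVM | 1353 | 108 | 56 | 18 | 7.39% | 75.68%
IsolationForest | 1436 | 25 | 66 | 8 | 1.71% | 89.19%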
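In the XDR confusion table, Logistic Regression and SGD SVM report identical counts (TN 1353, FP 108, FN 56, TP 18). The report alone cannot show whether this is a genuine coincidence or a pipeline artifact. A small check like the one below can flag such rows for review; the helper name `identical_rows` is hypothetical.

```python
from collections import defaultdict

# (TN, FP, FN, TP) per model, copied from the XDR confusion table.
xdr_counts = {
    "Logistic Regression": (1353, 108, 56, 18),
    "Random Forest (SMOTE)": (1287, 174, 39, 35),
    "LightGBM": (1353, 108, 43, 31),
    "Balanced RF": (1314, 147, 38, 36),
    "SGD SVM": (1353, 108, 56, 18),
    "IsolationForest": (1436, 25, 66, 8),
}


def identical_rows(counts):
    """Group models whose confusion counts match exactly."""
    groups = defaultdict(list)
    for model, row in counts.items():
        groups[row].append(model)
    return [models for models in groups.values() if len(models) > 1]


print(identical_rows(xdr_counts))
# [['Logistic Regression', 'SGD SVM']]
```

LightGBM shares TN and FP with those two models but differs in FN and TP, so it is not flagged.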

Best Models by Metric

Metric | Best Model | Value
Accuracy | IsolationForest | 0.9407
Balanced Acc | Balanced RF | 0.6929
Precision | IsolationForest | 0.2424
Recall | Balanced RF | 0.4865
F1 | LightGBM | 0.2911
ROC-AUC | LightGBM | 0.8489
PR-AUC | LightGBM | 0.2725
Lowest False Positive Rate | IsolationForest | 1.71%
Lowest Miss Rate | Balanced RF | 51.35%

XDR – Metrics by Model

XDR – ROC Curves

XDR – Precision–Recall Curves

XDR – Predicted Probability Distributions

XDR – Threshold Sweep

XDR: Logistic Regression – Detailed Analysis

XDR – Logistic Regression: Confusion Matrix

XDR – Logistic Regression: Classification Report

Class | precision | recall | f1 | support
0 | 0.9603 | 0.9261 | 0.9429 | 1461
1 | 0.1429 | 0.2432 | 0.1800 | 74
accuracy | n/a | n/a | 0.8932 | 1535

XDR – Logistic Regression: Feature Importance

XDR: Random Forest (SMOTE) – Detailed Analysis

XDR – Random Forest (SMOTE): Confusion Matrix

XDR – Random Forest (SMOTE): Classification Report

Class | precision | recall | f1 | support
0 | 0.9706 | 0.8809 | 0.9236 | 1461
1 | 0.1675 | 0.4730 | 0.2473 | 74
accuracy | n/a | n/a | 0.8612 | 1535

XDR – Random Forest (SMOTE): Feature Importance

XDR: LightGBM – Detailed Analysis

XDR – LightGBM: Confusion Matrix

XDR – LightGBM: Classification Report

Class | precision | recall | f1 | support
0 | 0.9692 | 0.9261 | 0.9471 | 1461
1 | 0.2230 | 0.4189 | 0.2911 | 74
accuracy | n/a | n/a | 0.9016 | 1535

XDR – LightGBM: Feature Importance

XDR: Balanced RF – Detailed Analysis

XDR – Balanced RF: Confusion Matrix

XDR – Balanced RF: Classification Report

Class | precision | recall | f1 | support
0 | 0.9719 | 0.8994 | 0.9342 | 1461
1 | 0.1967 | 0.4865 | 0.2802 | 74
accuracy | n/a | n/a | 0.8795 | 1535

XDR – Balanced RF: Feature Importance

XDR: SGD SVM – Detailed Analysis

XDR – SGD SVM: Confusion Matrix

XDR – SGD SVM: Classification Report

Class | precision | recall | f1 | support
0 | 0.9603 | 0.9261 | 0.9429 | 1461
1 | 0.1429 | 0.2432 | 0.1800 | 74
accuracy | n/a | n/a | 0.8932 | 1535

XDR – SGD SVM: Feature Importance

XDR: IsolationForest – Detailed Analysis

XDR – IsolationForest: Confusion Matrix

XDR – IsolationForest: Classification Report

Class | precision | recall | f1 | support
0 | 0.9561 | 0.9829 | 0.9693 | 1461
1 | 0.2424 | 0.1081 | 0.1495 | 74
accuracy | n/a | n/a | 0.9407 | 1535

XDR – IsolationForest: Feature Importance

Feature importance not available for this model type.